There is a need for a partially automated approach to detecting outliers in animal tracking data.

Isolation forest is an efficient anomaly detection method for high-dimensional data. The data can be of any type and require no scaling. The algorithm grows an ensemble of random decision trees: at each split it picks a feature at random and partitions the data at a random threshold value. As the data get increasingly partitioned, each observation is eventually isolated, and observations that are easiest to isolate (i.e., isolated after the fewest splits) are identified as anomalous.

Observations are given an anomaly score between 0 and 1: scores at 0.5 and below indicate normal observations, while scores approaching 1 indicate increasingly anomalous ones.
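As a minimal sketch of this scoring behavior, here is an example using scikit-learn's `IsolationForest` (an assumption on my part; any isolation forest implementation would do). Note that scikit-learn's `score_samples` returns the *negated* Liu et al. score, so flipping the sign recovers the familiar 0-to-1 scale:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Toy data: 200 "normal" 2-D points plus three obvious outliers.
normal = rng.normal(0, 1, size=(200, 2))
outliers = np.array([[8.0, 8.0], [-9.0, 7.5], [10.0, -8.0]])
X = np.vstack([normal, outliers])

forest = IsolationForest(n_estimators=200, random_state=42).fit(X)

# score_samples is the negated Liu et al. score; negate it back
# to get scores on the (0, 1] scale described above.
scores = -forest.score_samples(X)

print(scores[:200].mean())  # bulk of the data: lower scores
print(scores[-3:])          # planted outliers: noticeably higher scores
```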

Liu, F. T., Ting, K. M., & Zhou, Z. H. (2012). Isolation-based anomaly detection. ACM Transactions on Knowledge Discovery from Data (TKDD), 6(1), 1-39.

For movement data, I propose a workflow in which speed, distance from the median location, and (when relevant) height are evaluated with isolation forest to detect anomalies in animal tracking data.

The plot above shows example data from a kinkajou. The data were collected in 6-second bursts every 4 minutes. The only processing applied to this dataset for this example was filtering it down to the last fix of each burst.

I calculated speed, distance from the median location for each fix, and the vertical component of the tag data. I then applied isolation forest four times: once as a multidimensional case including all three metrics, and once for each metric as a single dimension.
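The feature calculation and the four isolation forest runs could be sketched as follows. This is an illustration, not my production code: the column names (`x`, `y`, `time`, `height`) are hypothetical, and it assumes projected coordinates in meters with timestamps in seconds.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

def score_features(track: pd.DataFrame) -> pd.DataFrame:
    """Compute the three metrics, then run isolation forest four times:
    once on all metrics jointly and once per metric.
    Assumes projected x/y in meters, 'time' in seconds, and a 'height'
    column; all names are hypothetical."""
    dx = track["x"].diff()
    dy = track["y"].diff()
    dt = track["time"].diff()
    feats = pd.DataFrame({
        # speed between consecutive fixes (m/s)
        "speed": np.sqrt(dx**2 + dy**2) / dt,
        # distance of each fix from the track's median location
        "dist_median": np.hypot(track["x"] - track["x"].median(),
                                track["y"] - track["y"].median()),
        # vertical component of the tag data
        "height": track["height"],
    }).fillna(0.0)

    scores = pd.DataFrame(index=track.index)
    forest = IsolationForest(n_estimators=200, random_state=1)
    # Multidimensional case: all three metrics together.
    scores["all"] = -forest.fit(feats).score_samples(feats)
    # Single-dimension cases: one metric at a time.
    for col in feats.columns:
        scores[col] = -forest.fit(feats[[col]]).score_samples(feats[[col]])
    return scores
```

Each column of the returned frame is one run's anomaly scores, which makes comparing the multidimensional and single-dimension verdicts straightforward.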

Because I am interested in bad fixes, not behavioral outliers, I only consider fixes with anomaly scores at or above the 99.5% quantile of the score distribution.

The following plot zooms in on a potential outlier of interest to show the behavior before and after it.

Isolation forest seems to do a reasonable job identifying egregious outliers. In this example, 80 of 15,965 fixes were unanimously identified as outliers. I propose automatically flagging unanimously detected anomalies as outliers, and leaving outliers detected in only a single dimension as points requiring manual inspection.
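The flagging rule could be sketched like this, assuming a data frame with one anomaly-score column per isolation forest run (the column layout is hypothetical):

```python
import pandas as pd

def flag_outliers(scores: pd.DataFrame, q: float = 0.995) -> pd.Series:
    """Label each fix from per-run anomaly scores (one column per run).
    Exceeding the q-quantile in every run -> 'unanimous' (auto-flag);
    exceeding it in at least one run -> 'inspect' (manual check);
    otherwise 'ok'."""
    # True where a score falls in the top (1 - q) tail of its column.
    exceed = scores.gt(scores.quantile(q))
    labels = pd.Series("ok", index=scores.index)
    labels[exceed.any(axis=1)] = "inspect"    # flagged by at least one dimension
    labels[exceed.all(axis=1)] = "unanimous"  # flagged by every dimension
    return labels
```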

To better evaluate the ability to detect “true” outliers, I simulated twenty random movement tracks from an OUF process and subsampled them to a 5-minute sampling rate. Each track consisted of 52,562 GPS fixes. I added realistic noise to each simulated fix to resemble common error from GPS collars, with errors drawn from a negative binomial distribution (mu = 8, size = 2.5). Outliers were created at a 0.2% rate (105 outliers per track) and were generated from a gamma distribution (shape = 8, rate = 0.005).
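The error and outlier injection can be sketched as below, using the distribution parameters stated above. The OUF track itself would come from a dedicated package (e.g., ctmm in R); a random walk stands in for it here, so treat this as an illustration of the noise model only. Note that numpy parameterizes the negative binomial by (n, p) rather than (mu, size), and the gamma by scale = 1 / rate.

```python
import numpy as np

rng = np.random.default_rng(7)
n_fix = 52562

# Stand-in for one simulated OUF track (in practice, e.g., ctmm in R).
track = np.cumsum(rng.normal(0, 5, size=(n_fix, 2)), axis=0)

# GPS error magnitudes (m) from a negative binomial with mu = 8,
# size = 2.5; converting to numpy's (n, p): p = size / (size + mu).
size, mu = 2.5, 8.0
err_mag = rng.negative_binomial(size, size / (size + mu), n_fix)
theta = rng.uniform(0, 2 * np.pi, n_fix)
noisy = track + err_mag[:, None] * np.column_stack([np.cos(theta), np.sin(theta)])

# Outliers at a 0.2% rate: large displacements from a gamma with
# shape = 8 and rate = 0.005 (numpy uses scale = 1 / rate).
n_out = round(0.002 * n_fix)  # 105 outliers per track
idx = rng.choice(n_fix, n_out, replace=False)
jump = rng.gamma(8, 1 / 0.005, n_out)
theta_o = rng.uniform(0, 2 * np.pi, n_out)
noisy[idx] += jump[:, None] * np.column_stack([np.cos(theta_o), np.sin(theta_o)])
```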

I evaluated the effect of the confidence threshold on the synthetic tracks, testing outlier detection performance across confidence values from .995 to .999. Below is a plot showing the accuracy, precision, recall, F1 score, and specificity across these confidence values.

Further evaluation is necessary across different error distributions; however, these preliminary assessments suggest that specificity is generally high and that the risk of false negatives is low across these confidence values. If you are willing to place more trust in the algorithm and can accept a low rate of false positives, I recommend setting the confidence threshold to .996. If false positives are unacceptable and must be minimized, and you are willing to rely more on manual inspection to catch false negatives, set the confidence to .998.

Below is a clarification on the (temporary) nomenclature and calculations visualized above.

Calculation of performance metrics

Unanimous outliers – Outliers that are unanimously anomalous across all dimensions

Soft outliers – Points that are anomalous in only one dimension

True positive: Outliers that I created that were labeled unanimous outliers

Soft positive: Outliers that I created that were labeled soft outliers

False positive: Points that are not outliers that were labeled unanimous outliers

Soft false positive: Points that are not outliers that were labeled soft outliers

True negative: Points that are not outliers that were labeled not anomalous

False negative: Outliers that I created that were labeled not anomalous

Accuracy tells us what proportion of all points, outliers and non-outliers alike, were correctly classified

\[ Accuracy = \frac{\mbox{True positive}+\mbox{True negative}}{\mbox{True positive}+\mbox{True negative}+\mbox{False positive}+\mbox{False negative}} \]

\[ \mbox{Accuracy soft} = \frac{\mbox{True positive}+\mbox{Soft positive}+\mbox{True negative}}{\mbox{True positive}+\mbox{Soft positive}+\mbox{True negative}+\mbox{False positive}+\mbox{Soft false positive}+\mbox{False negative}} \]

Precision tells us what fraction of the points we labeled as outliers were actually outliers

\[ Precision = \frac{\mbox{True positive}}{\mbox{True positive}+\mbox{False positive}} \]

\[ \mbox{Precision soft} = \frac{\mbox{True positive}+\mbox{Soft positive}}{\mbox{True positive}+\mbox{Soft positive}+\mbox{False positive}+\mbox{Soft false positive}} \]

Recall, also known as sensitivity, tells us how many of the real outliers we correctly recognized as outliers

\[ Recall = \frac{\mbox{True positive}}{\mbox{Total positive}} \]

\[ \mbox{Recall soft} = \frac{\mbox{True positive}+\mbox{Soft positive}}{\mbox{Total positive}} \]

The F1 score is the harmonic mean of precision and recall. It is preferred over accuracy when there is an uneven class distribution, and it captures the balance between precision and recall.

\[ \mbox{F1 Score} = \frac{2*(Recall * Precision)}{Recall + Precision} \]

\[ \mbox{F1 Score soft} = \frac{2*(\mbox{Recall soft} * \mbox{Precision soft})}{\mbox{Recall soft} + \mbox{Precision soft}} \]

Specificity is the proportion of real non-outliers that were correctly labeled as not anomalous, i.e. how many of the real non-outliers did we correctly identify?

\[ Specificity = \frac{\mbox{True negative}}{\mbox{Total negative}} \]
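The metrics above can be computed in one place from the confusion counts; this is a sketch, and the soft variants follow by substituting the combined counts (e.g., true positive + soft positive) as defined above. The example counts in the test are hypothetical.

```python
def performance(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute the metrics defined above from confusion counts.
    Total positive = tp + fn; total negative = tn + fp."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * recall * precision / (recall + precision)
          if recall + precision else 0.0)
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "specificity": specificity}
```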

Prioritize recall if false negatives are unacceptable. Prioritize precision if you’d rather miss some true positives than tolerate false positives. Prioritize specificity if you don’t want to tolerate false positives among the non-outliers.